Abstract

Advances in microsensors, microprocessors and microdisplays are creating new opportunities for improving vision in degraded environments through the use of head-mounted displays. Initially, the cutting-edge technology used in these new displays will be expensive. Inevitably, the cost of providing the additional sensor and processing required to support binocularity brings the value of binocularity into question. Several assessments comparing binocular, biocular and monocular head-mounted displays for aided vision have concluded that the additional performance, if any, provided by binocular head-mounted displays does not justify the cost. The selection of a biocular display for use in the F-35 is a current example of this recurring decision process. It is possible that the human binocularity advantage does not carry over to the aided vision application, but it is more likely that the experimental approaches used in the past have been too coarse to measure its subtle but important benefits. Evaluating the value of binocularity in aided vision applications requires an understanding of the characteristics of both human vision and head-mounted displays. With this understanding, the value of binocularity in aided vision can be estimated and experimental evidence can be collected to confirm or reject the presumed binocular advantage, enabling improved decisions in aided vision system design. This paper describes four computational approaches that may be useful in quantifying the advantage of binocularity in aided vision: geometry of stereopsis, modulation transfer function area for stereopsis, probability summation and binocular summation.

Key Words: Head Mounted Display, Night Vision Goggle, Stereopsis, Modulation Transfer Function, Probability Summation
Introduction


Human vision, taken in its entirety, is an unparalleled sensing system, but there are degraded environments (low light, rain, fog, smoke) where vision can be greatly improved, or aided, through an intermediary electronic sensor/display system. For the purpose of this paper, an aided vision system is defined as a head-mounted display (HMD) that produces a spatially and temporally aligned image of the operator’s environment. The resulting veridical perception allows the operator to navigate and manipulate the environment in a natural and efficient manner. Night vision goggles (NVGs), in which the optical axes of a unity-magnification inline sensor/display system are aligned with the operator’s visual axes, are an extremely successful example of an aided vision system for low light environments.

In the design of HMDs for aided vision, the decision to pursue a binocular, biocular or monocular system is based on a multitude of factors. Some of these factors, such as weight, size, power and cost, are quantifiable and reasonably well understood. Other important factors, such as the various parameters of visual performance, are more subtle, situationally dependent and less well-defined. Numerous studies comparing visual performance while using HMDs of different ocular configurations have produced inconsistent results (13,25,38). Extrapolation from these studies to predict visual performance using new HMD systems, or older systems in new environments, should be done with caution and in the context of known display and human visual characteristics.

The human visual sense arises from an extremely complex neural process. Knowledge of this process is derived from a limited number of experiments conducted under a limited number of conditions, and the accumulated knowledge is captured in simplified models. Models derived from one set of conditions can provide insight into other conditions, but adjustments to the models are often needed. Using this strategy, unaided vision models can serve as initial estimates of aided vision performance and point toward research that can extend current models to better predict performance with new aided vision systems.

In human vision the retinal image is processed in multiple neural streams, so analyzing the outcome of a single stream does not provide a comprehensive assessment of visual performance. In binocular vision the independent image information gathered by the two eyes is simultaneously compared, to extract stereoscopic depth information, and summed, to improve image perception (21). An assessment of the value of binocularity under a given set of conditions needs to take both the summation and comparison streams into consideration. It is important to note that the value of these two streams may be negatively correlated. For example, under low light conditions the summation stream’s contribution may be extremely important in collecting sparse information from the scene, while the comparative stream provides little benefit. Increasing the scene illuminance changes the relative contributions of the streams: in abundant light the summation stream’s contribution is reduced and the comparative stream’s stereoscopic contribution is maximized. Spatial discrimination, a critical visual skill, is processed through both streams. Three types of spatial discrimination are discussed in this article: resolution, Vernier acuity, and stereoacuity. The first two, resolution and Vernier acuity, benefit from the summed inputs of the two eyes, while stereoacuity depends on a comparison of these inputs.
Resolution, the most familiar spatial discrimination, is the ability to distinguish the separation between two points (or lines) and is routinely measured in people using the Snellen eye chart. The resolution of optical equipment is measured using tri-bar and sine-wave charts. Vernier acuity and stereoacuity are less well known, but make for an interesting comparison. Both are discriminations of misalignment, and both depend on the same initial physiology, including the focusing optics of the eye, the density of retinal photoreceptors, and the size of neuronal receptive fields. In addition, they can both be measured on the same experimental apparatus under nearly identical conditions, the only difference being the orientation of the perceived misalignment (4). The orientation determines whether the two eyes see nearly identical views that can be summed (Vernier acuity) or disparate views that must be compared (stereoacuity). This difference has measurable perceptual consequences. Under optimal viewing conditions Vernier acuity and stereoacuity have very similar values, both as small as 2 arcsec (2,20), but as conditions degrade the two values diverge, with Vernier acuity being relatively resilient to the degradation (4).

Binocularity provides a definite performance improvement on several visual factors, including visual field, depth perception, resolution, detection, and reaction time. Undoubtedly these improvements are important, but experimental evidence is required to confirm that the binocular advantage exists for any given application of a binocular aided vision system. Models are needed to guide this research, to aid in interpreting results, and to facilitate the accumulation of the acquired knowledge. Towards this goal, four potential models are discussed in this paper. The Geometry of Stereopsis and the Modulation Transfer Function Area for Stereopsis models may be useful in defining the value of the comparative aspects of binocularity in aided vision systems. Probability summation and its progeny, binocular summation, may have predictive value regarding the summation aspects of binocularity in aided systems.

Geometry of Stereopsis

In the geometry of stereopsis model there are four main factors contributing to stereo depth perception: 1) the distance between the observer’s eyes, often called the interpupillary distance (P); 2) the distance from the eyes to the fixation point (D), also known as the vergence distance; 3) the distance between the fixation point and the object of interest; and 4) the spatial discrimination capabilities of the eyes. The first three factors define the geometric relationships among the relevant items for a given viewing condition. Cormack and Fox (12) provide a thorough discussion, including equations, of this geometry. The fourth factor defines the envelope of geometries that can be perceived by the binocular visual system. Stereoacuity (S) delineates the minimum perceptible depth difference (MPDD) boundary of the envelope, where:

$$\mathrm{MPDD} = \frac{P}{2\left[\frac{P}{2D} + \tan\left(\frac{S}{2}\right)\right]} - D$$

This boundary is represented graphically in Figure 1 for a variety of stereoacuities. The x-axis (abscissa) is fixation distance and the y-axis (ordinate) is the minimum distance a second object must be moved in from the fixation point for the observer to perceive, based on stereopsis, that it is closer. In this figure the 2 arc-second (arcsec) curve represents the predicted depth discrimination capabilities of an observer with excellent stereoacuity under optimal visual conditions. The additional curves show the predicted level of depth discrimination for a selection of reduced stereoacuity levels that could result from either a compromised visual system or degraded visual conditions. The geometry of stereopsis model predicts that stereo depth discrimination is best when viewing objects at near and that it gradually declines with increased viewing distance, with the rate of decline being heavily dependent on stereoacuity.
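As a rough numerical illustration of the MPDD boundary, the Python sketch below evaluates the expression for several fixation distances and stereoacuities. It assumes an interpupillary distance of 0.064 m (a typical adult value chosen for illustration), distances in meters and stereoacuity in arcseconds; the function name `mpdd` and the listed stereoacuities are hypothetical and do not reproduce the exact curves of Figure 1. Because the first term of the expression is the distance of the nearer, just-discriminable object, the sketch returns D minus that distance, i.e. the magnitude of the depth step.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi  # arcseconds in one radian

def mpdd(fixation_m, stereoacuity_arcsec, ipd_m=0.064):
    """Minimum perceptible depth difference in meters, in front of fixation.

    Evaluates P / {2[(P/2D) + tan(S/2)]}, the distance of the nearer,
    just-discriminable object, and returns D minus that distance.
    ipd_m = 0.064 m is an assumed, typical interpupillary distance.
    """
    half_angle_rad = stereoacuity_arcsec / ARCSEC_PER_RAD / 2.0
    nearer_m = ipd_m / (2.0 * (ipd_m / (2.0 * fixation_m) + math.tan(half_angle_rad)))
    return fixation_m - nearer_m

# Illustrative stereoacuities and distances (not the exact curves of Figure 1).
for s in (2, 10, 30, 60):
    row = ", ".join(f"{d} m: {mpdd(d, s):.3f} m" for d in (1, 10, 50, 100))
    print(f"{s:>2} arcsec -> {row}")
```

For small angles the result is close to S·D²/P (with S in radians), so the depth step grows roughly with the square of the fixation distance and in proportion to the stereoacuity value, which is the pattern the text describes.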

The introduction of a well-fit binocular NVG in front of the eyes changes the stereopsis viewing geometry very little. The optical axes of the NVG tubes are coincident with the visual axes of the observer’s eyes (i.e., the interpupillary distance is unchanged) and the unity magnification of the goggles leaves the viewing distance undistorted. In other words, the geometry of stereopsis model should apply to binocular night vision goggles and other similar aided vision devices. Theoretically, if the spatial relationships of a critical depth detection task were known, the model could be used to estimate the binocular HMD stereoacuity required to perform the task under aided vision conditions. Practically, the model has limited value without a method for determining the stereoacuity supported by a binocular HMD.
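Solving the same expression for S gives the stereoacuity a binocular HMD would need to support for a task with known geometry. The sketch below assumes the object of interest lies a distance Δ in front of a fixation point at distance D, so that tan(S/2) = P/[2(D − Δ)] − P/(2D). The 0.064 m interpupillary distance and the example task (a 1 m depth step judged at 20 m and at 40 m) are illustrative assumptions, and `required_stereoacuity` is a hypothetical helper.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi  # arcseconds in one radian

def required_stereoacuity(fixation_m, depth_step_m, ipd_m=0.064):
    """Stereoacuity (arcsec) needed to see an object depth_step_m nearer than fixation.

    Solves MPDD = P / {2[(P/2D) + tan(S/2)]} - D for S with the nearer object
    at D - depth_step_m; ipd_m = 0.064 m is an assumed typical value.
    """
    if not 0.0 < depth_step_m < fixation_m:
        raise ValueError("depth step must be positive and smaller than the fixation distance")
    nearer_m = fixation_m - depth_step_m
    tan_half = ipd_m / (2.0 * nearer_m) - ipd_m / (2.0 * fixation_m)
    return 2.0 * math.atan(tan_half) * ARCSEC_PER_RAD

# Hypothetical task: notice a 1 m depth step at 20 m and at 40 m.
for distance_m in (20.0, 40.0):
    print(f"{distance_m:.0f} m: {required_stereoacuity(distance_m, 1.0):.1f} arcsec")
```

Under these assumptions, doubling the distance of the same 1 m step tightens the requirement roughly four-fold (about 35 arcsec at 20 m versus about 8.5 arcsec at 40 m).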
Figure 1. Minimum Perceptible Depth Difference (MPDD) by Viewing Distance for Several Stereoacuity Levels


Multiple investigations of NVGs using a Howard-Dolman apparatus have concluded that under good illumination, using high contrast targets, NVGs can support approximately 17 arcsec of stereoacuity (22). This is considerably poorer than the optimal stereoacuity for unaided vision, but Figure 1 suggests that it is sufficient to help with depth discriminations at short to moderate distances. Knight et al. suspected that reduced resolution was the major contributor to this reduction in stereoacuity (22). Since resolution and stereoacuity are both spatial discriminations they should be highly correlated, but the relationship is complicated. Unaided stereoacuity is more vulnerable to degradations in image quality than either resolution or Vernier acuity. Stereoacuity has been shown to be more susceptible to reduced luminance (4,26), reduced viewing duration (47), defocus, spatial frequency filtering (46) and reduced contrast (16). Since it is impractical to measure every HMD’s stereoacuity under every possible imaging condition, a predictive model of HMD stereoacuity performance is needed that takes these additional image quality factors into consideration. The model discussed next may help fulfill this need.

Modulation Transfer Function Area for Stereopsis

In many aided vision applications contrast is the primary limiting factor in the image. Laboratory evaluations of HMD resolution are often misleading, as the intended operational environments do not typically contain the high black-on-white contrast found in eye charts or test patterns. The reduced contrast of the earth-tone colors found in natural scenes may be further reduced by camouflage or by obscurants such as smoke, fog, rain or dust. Often it is reduced scene contrast that drives the desire to view the scene in a different spectral band, and hence the need to use an HMD for a particular application.

Linear systems (i.e., Fourier) analysis, in which an image is described as a series of sine-wave components, has been extremely valuable in describing the relationship between spatial discrimination and contrast. It has been most widely used in the analysis of resolution in imaging systems, but is equally relevant to other spatial discriminations, including stereoacuity. The most common representation of linear-systems-based performance data for imaging is the modulation transfer function (MTF). In MTF measurements, a series of high contrast sine-wave gratings with varying spatial frequencies is imaged through the optical system. For each grating frequency, the contrast in the resulting image is compared to the contrast in the original grating; the ratio of the two is the MTF value at that frequency.
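The measurement procedure can be mimicked numerically. In the sketch below a hypothetical optical system is modeled as a Gaussian blur with width σ in degrees of visual angle (an assumed stand-in for real optics); each full-contrast grating is blurred, and the ratio of image to object Michelson contrast gives one MTF sample. For a Gaussian blur the expected ratio is exp(−2π²σ²f²), which offers an independent check. All function names are illustrative.

```python
import math

def michelson_contrast(luminance):
    """Michelson contrast (Lmax - Lmin) / (Lmax + Lmin) of a sampled grating."""
    lmax, lmin = max(luminance), min(luminance)
    return (lmax - lmin) / (lmax + lmin)

def sine_grating(freq_cpd, contrast, mean=1.0, n=1000, extent_deg=1.0):
    """Sample a 1-D sinusoidal grating across extent_deg degrees of visual angle."""
    dx = extent_deg / n
    return [mean * (1.0 + contrast * math.cos(2.0 * math.pi * freq_cpd * i * dx))
            for i in range(n)]

def gaussian_blur(luminance, sigma_deg, extent_deg=1.0):
    """Blur with a normalized, truncated Gaussian (hypothetical optics model).

    The convolution wraps around the sample window, which is seamless only
    when the grating holds a whole number of cycles.
    """
    n = len(luminance)
    dx = extent_deg / n
    half = int(math.ceil(4.0 * sigma_deg / dx))
    offsets = range(-half, half + 1)
    weights = [math.exp(-0.5 * (j * dx / sigma_deg) ** 2) for j in offsets]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * luminance[(i + j) % n] for j, w in zip(offsets, weights))
            for i in range(n)]

def measured_mtf(freq_cpd, sigma_deg):
    """Image contrast divided by object contrast for a full-contrast grating."""
    obj = sine_grating(freq_cpd, contrast=1.0)
    img = gaussian_blur(obj, sigma_deg)
    return michelson_contrast(img) / michelson_contrast(obj)

sigma = 0.01  # assumed blur width in degrees
for f in (5, 10, 20, 25):  # whole cycles per 1-degree window
    expected = math.exp(-2.0 * math.pi ** 2 * sigma ** 2 * f ** 2)
    print(f"{f:>2} cycles/deg: measured {measured_mtf(f, sigma):.3f}, "
          f"Gaussian theory {expected:.3f}")
```

The measured values fall with frequency, reproducing the downward-sloping MTF shape described next.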

Every optical system suffers from imperfections, including diffraction, reflections, aberrations, and scatter, which deviate light from its intended path and result in reduced image contrast. Poor optical systems disperse large amounts of light at large angles, which causes substantial contrast loss at all grating frequencies, while good quality optical systems suffer disproportionate contrast loss only at higher frequencies. MTFs provide a graphical representation of measured contrast loss across a range of grating frequencies and are typically graphed with spatial frequency on the abscissa and contrast on the ordinate. This gives measured MTFs a characteristic appearance: the curve slopes down and to the right as it approaches the system's frequency cut-off, which defines the system's resolution. Figure 2 shows a traditional MTF representation, plotted on log-log scaled axes, for a hypothetical HMD.

The interpretation of the MTF curve is straightforward: contrast and spatial frequency combinations above and to the right of the curve cannot be produced by the optical system and are unavailable for image synthesis.
In optical systems designed for human viewing, the MTF is often graphed along with the human contrast threshold function (HCTF) for grating detection. The HCTF is defined as the minimum amount of contrast needed for a typical person to perceive each grating frequency. The HCTF is relatively flat for low spatial frequencies, but as grating frequency increases, approaching the visual system’s frequency limit, more contrast is required to support perception. The absolute limit, or resolution, is reached when the function approaches a Michelson contrast of 1, a grating of dense black stripes on an intense white background. Interpretation of the HCTF is equally clear: contrast and spatial frequency combinations below and to the right of the curve cannot be seen through the system.

As portrayed in Figure 2, most HMD systems cannot match the visual system’s cut-off frequency; the HMD’s MTF starts to fall earlier and crosses the HCTF at a lower frequency. This intersection of the curves establishes the cut-off limit for a human using the HMD. The combined Human & HMD cut-off frequency is analogous to the visual resolution discussed previously. Resolution, as a measure of an optical system's ability to image high frequencies, is a simple and useful tool, but it is not a thorough description of image quality. Alternatively, MTFs provide a comprehensive description of optical performance across all spatial frequencies, but are cumbersome in practical application. A third approach, taking the area under the MTF, attempts to split the difference between the simple and comprehensive extremes with some success. The area under the MTF curve provides a single number that correlates with image quality, but lacks the intuitive interpretability that a familiar measure like resolution provides. Another limitation of the plain MTF area is the assumption that all frequency and contrast combinations are equally valuable, which is clearly incorrect given the variable sensitivity of the human visual system. Subtracting the area under the HCTF from the MTF area ensures that the most usable frequencies are given more weight in the measurement. This improved measure is known as the Modulation Transfer Function Area (MTFA) (3).


Figure 2. Hypothetical Linear Systems Analysis for an Aided Vision HMD System
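Both the combined cut-off and the MTFA are easy to compute once the two curves are tabulated. The sketch below uses two hypothetical analytic curves, an exponential HMD MTF and an exponentially rising threshold, chosen only so the crossing can be checked by hand; they are not the curves plotted in Figure 2. It steps along a frequency grid, interpolates linearly to find where the MTF falls to the threshold, and integrates the difference between the curves with the trapezoidal rule up to that point. Computing the area on linear contrast and frequency axes is itself an assumption of the sketch.

```python
import math

def hmd_mtf(f, scale_cpd=8.0):
    """Hypothetical HMD MTF: exponential roll-off (f in cycles/deg)."""
    return math.exp(-f / scale_cpd)

def hctf(f, t0=0.005, rise_cpd=10.0):
    """Hypothetical contrast threshold: low at low f, rising toward 1."""
    return t0 * math.exp(f / rise_cpd)

def cutoff_and_mtfa(mtf, threshold, f_max=60.0, df=0.01):
    """Return (cut-off frequency, area between MTF and threshold below it)."""
    n = int(round(f_max / df))
    freqs = [i * df for i in range(n + 1)]
    diff = [mtf(f) - threshold(f) for f in freqs]
    area = 0.0
    for k in range(n):
        if diff[k + 1] <= 0.0:
            # Interpolate to the crossing and close the area with a triangle.
            cutoff = freqs[k] + df * diff[k] / (diff[k] - diff[k + 1])
            area += 0.5 * diff[k] * (cutoff - freqs[k])
            return cutoff, area
        area += 0.5 * (diff[k] + diff[k + 1]) * df  # trapezoid for this step
    raise ValueError("curves do not cross below f_max")

cutoff, mtfa = cutoff_and_mtfa(hmd_mtf, hctf)
print(f"combined cut-off: {cutoff:.2f} cycles/deg, MTFA: {mtfa:.3f}")
```

For these particular curves the crossing solves exp(−f/8) = 0.005·exp(f/10), i.e. f = ln(200)/0.225 ≈ 23.5 cycles/deg, which the grid search reproduces.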


One last feature of linear systems analysis, the ability to cascade MTFs, makes it a particularly powerful modeling tool. Cascading MTFs is the process of calculating an overall system MTF by taking the product of the component MTFs. For example, the resolution of a myopic observer viewing a target through an NVG can be estimated by taking the product of the MTFs for the target, atmosphere, NVG and the observer’s spectacle lens. This cascaded MTF can then be analyzed together with the HCTF; the intersection of the two curves provides a resolution estimate for that particular viewing condition. In addition, the area between the cascaded MTF and the HCTF defines the MTFA for the situation, providing a more general measure of the expected image quality.
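Cascading is pointwise multiplication, so it is easy to sketch. Every component curve below (a uniform 0.25 target contrast, a flat 0.8 atmospheric contrast transmission, an exponential NVG MTF and a mild Gaussian spectacle-lens blur) and the threshold curve are hypothetical placeholders rather than measured data; the helper names `cascade` and `cutoff` are likewise illustrative. Because the cascaded MTF falls and the threshold rises with frequency, a simple bisection finds their intersection, which serves as the resolution estimate.

```python
import math

# Hypothetical component MTFs; f is spatial frequency in cycles/deg.
def target(f):
    return 0.25                         # assumed uniform target contrast

def atmosphere(f):
    return 0.8                          # assumed flat contrast loss from haze

def nvg(f):
    return math.exp(-f / 8.0)           # assumed NVG roll-off

def spectacles(f):
    return math.exp(-(f / 30.0) ** 2)   # assumed mild spectacle-lens blur

def hctf(f):
    return 0.005 * math.exp(f / 10.0)   # assumed contrast threshold curve

def cascade(*components):
    """Return the system MTF: the product of the component MTFs at each frequency."""
    def system(f):
        value = 1.0
        for mtf in components:
            value *= mtf(f)
        return value
    return system

def cutoff(system_mtf, threshold, lo=0.0, hi=60.0, iters=60):
    """Bisection for the frequency where a falling MTF meets a rising threshold."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if system_mtf(mid) > threshold(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

nvg_only = cascade(nvg)
full_path = cascade(target, atmosphere, nvg, spectacles)
print(f"NVG alone: {cutoff(nvg_only, hctf):.1f} cycles/deg")
print(f"target, atmosphere, NVG and spectacles: {cutoff(full_path, hctf):.1f} cycles/deg")
```

Every factor here is at most 1, so each added component can only lower the cascaded curve, and the estimated cut-off for the full viewing path comes out lower than for the NVG alone.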

Up to this point, the discussion of linear systems analysis and human vision has assumed a single 2D sine-wave image viewed by both eyes, but the techniques are equally valid for analyzing stereo image pairs. The principles of Fourier synthesis and analysis still hold, and the concepts of contrast, spatial frequency, cascading MTFs, and HCTF translate directly to stereo systems, with two minor complications. First, the contrast required to detect subtle spatial differences in a stereo image pair is higher than the contrast required to detect a simple 2D grating; consequently, a linear systems model for stereopsis requires an adjustment to the HCTF. Second, unlike resolution, the relationship between cut-off frequency and stereoacuity is unclear and needs to be explored.

The HCTF is often described as a single function relating threshold visual detection to grating contrast and spatial frequency. It is more appropriate to think of the HCTF as a family of functions dependent on many variables, including grating parameters (luminance, duration of presentation, chromatic content, orientation, motion), observer characteristics (age, health, pupil size, adaptive state), and task requirements (detection, discrimination, identification). A very large number of psychophysical experiments have been conducted to explore these variables and their impact on the HCTF, and stereoscopic tasks have been used in many of them. The models describing the relationship between contrast, spatial frequency and stereoacuity derived from these investigations can be complex. Here, for simplicity’s sake, it is sufficient to focus on one of the early experiments, conducted by Frisby and Mayhew (16). They used random-dot stereograms to directly compare the contrast required to detect dots constructed from a narrow band of frequencies with the contrast required to perceive stereo depth encoded in the same dots. They found that for a large range of stereo disparities the stereo HCTF had the same shape as the simple detection HCTF, but that approximately twice as much contrast was required to support the perception of stereoscopic depth. This finding can be used to build an MTFA model for stereopsis.

In Figure 3, four curves are plotted together. The graph is similar to Figure 2, except that the axes are linearly scaled to improve legibility. The first curve, NVG MTF, is the measured MTF of a generation 3 night vision goggle (17). The second curve, NVG & Target MTF, is the cascaded MTF of the NVG and a generic target. It is common in target detection modeling to assume uniform contrast across all target spatial frequencies (43); for this chart, target contrast is assumed to be 0.25, a typical value for a natural target and background. The third curve is a recommended HCTF for an adaptation luminance of 1.0 cd/m² (3), well within an NVG’s output range. The fourth curve, Stereo HCTF, is an estimate of the contrast threshold for supporting stereopsis, calculated by doubling the HCTF value (consistent with the findings of Frisby and Mayhew). Together, these four curves describe the spatial frequency bandwidth that is available to support both simple detection and stereoscopic tasks. In Figure 3, point (A) is both the predicted high frequency cut-off and the resolution for a person using NVGs to detect a high contrast grating. Point (B) is the predicted high frequency cut-off for a person using NVGs to perceive depth in a high contrast stereo grating. Point (C) is the predicted high frequency cut-off and resolution for a person using NVGs to view a typical contrast target. Point (D) is the predicted high frequency cut-off for a person using NVGs to perceive stereoscopic depth in a typical contrast target. Line (E) is the low frequency cut-off, as determined by target size and viewing distance. Moving nearer to the target enlarges its angular extent, extending the target’s content to lower spatial frequencies and increasing the bandwidth available for imaging. Conversely, moving farther away from the target reduces the bandwidth available for imaging. The striped area (F) is the MTFA for stereopsis, the palette of spatial frequencies and contrasts that are available for stereo image synthesis. This figure shows how reducing the contrast of a target or increasing the viewing distance can reduce the spatial bandwidth available for supporting stereopsis. It clearly illustrates how an NVG can perform well on laboratory resolution measurements yet fail to support stereopsis in the operational environment due to limited contrast. Finally, this approach defines two new variables, stereopsis cut-off frequency and stereopsis MTFA, which can be used in developing models for predicting stereoacuity for binocular HMDs.


Figure 3. Linear Systems Analysis of High Contrast Grating and Low Contrast Target Viewed with NVGs (See Text for Explanation)
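The construction of points A through D, line E and area F can be sketched numerically. The NVG MTF and detection threshold below are hypothetical analytic stand-ins, not the measured Gen 3 curve (17) or the recommended HCTF (3), with parameters picked only so the numbers land in a plausible range. The 0.25 target contrast and the doubling of the threshold for stereopsis follow the text. Line E is approximated, as an assumption, by one cycle across the target's angular size, here a hypothetical 2-degree target.

```python
import math

TARGET_CONTRAST = 0.25   # natural target against its background, as in the text
STEREO_FACTOR = 2.0      # stereo threshold about twice the detection threshold

def nvg_mtf(f):
    """Hypothetical NVG MTF: exponential roll-off (f in cycles/deg)."""
    return math.exp(-f / 5.0)

def hctf(f):
    """Hypothetical detection threshold: 1% contrast at low f, rising with f."""
    return 0.01 * math.exp(f / 10.0)

def stereo_hctf(f):
    return STEREO_FACTOR * hctf(f)

def grating(f):
    """High-contrast (contrast 1) grating seen through the NVG."""
    return nvg_mtf(f)

def target(f):
    """Cascade of the NVG MTF with a uniform-contrast target."""
    return TARGET_CONTRAST * nvg_mtf(f)

def crossing(available, threshold, lo=0.0, hi=60.0, iters=60):
    """Bisection for the frequency where available contrast meets the threshold."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if available(mid) > threshold(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def area_between(upper, lower, f_lo, f_hi, steps=10000):
    """Trapezoidal area between two curves over [f_lo, f_hi]."""
    df = (f_hi - f_lo) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        f = f_lo + i * df
        total += weight * (upper(f) - lower(f))
    return total * df

point_a = crossing(grating, hctf)         # detection cut-off, high-contrast grating
point_b = crossing(grating, stereo_hctf)  # stereo cut-off, high-contrast grating
point_c = crossing(target, hctf)          # detection cut-off, 0.25-contrast target
point_d = crossing(target, stereo_hctf)   # stereo cut-off, 0.25-contrast target

# Line E (assumption): lowest usable frequency = one cycle across the target.
target_size_deg = 2.0                     # hypothetical angular size
line_e = 1.0 / target_size_deg
area_f = area_between(target, stereo_hctf, line_e, point_d)  # stereo MTFA

for label, value in (("A", point_a), ("B", point_b), ("C", point_c), ("D", point_d)):
    print(f"{label}: {value:.1f} cycles/deg")
print(f"E: {line_e:.2f} cycles/deg; F (stereo MTFA): {area_f:.3f}")
```

Because every curve here is illustrative, the printed values show only the ordering A > B > C > D and how the stereo palette F is bounded by E and D; they are not predictions for a real goggle.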



Distribution  A:  Approved  for  public  release;  distribution  unlimited.  88ABW  Cleared  02/03/2014;  88ABW-20 14-0320. 


For example, one simple model assumes that stereoacuity maintains a constant proportional relationship to the period of the stereoscopic cut-off frequency. This proportion could (and arguably should) be expressed as a simple ratio, although doing so would obscure the abundance of evidence available to support the model. In the linear systems context, a portion of a sine wave’s period is reported as a phase angle (complete linear systems analysis includes frequency, amplitude and phase analysis). Despite the computational inconvenience, the ratio-to-phase-angle conversion is advantageous from both an imaging and a physiological perspective (33). In stereo imaging, the left and right eye images contain the same objects; therefore the Fourier analyses of the two images will be nearly identical in terms of spatial frequencies and amplitudes. The binocular disparities that drive the perception of depth for objects in the scene are a result of local phase shifts between the two images. These local phase shifts are detected by visual cortex cells that are tuned to phase shifts between the binocular retinal images. The existence of these phase-sensitive cortical cells has been verified through direct cell recordings (1,39).

Psychophysical experiments describe the perceptual consequences of these phase-sensitive cells, and indeed they confirm a constant phase relationship between stereoacuity and cut-off frequency for low to moderate spatial frequencies (8,32). Legge and Gu (23) generated spatial frequency versus stereoacuity curves for several observers and found phase angles ranging between 3 and 14 degrees. Taking the most conservative (largest) phase angle from this range, 14 degrees, and the cut-off frequency from Figure 3 (8 cycles/degree, a period of 450 arcsec), we can estimate the stereoacuity for a person using a Gen 3 NVG and looking at a 0.25 contrast target to be approximately 18 arcsec (14/360 × 450 arcsec = 17.5 arcsec). This stereoacuity estimate can then be used in the geometry of stereopsis model to estimate the minimum perceptible depth difference at a given viewing distance that can be attributed to binocularity.
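The arithmetic behind this estimate is short enough to write out. The sketch below converts a phase angle and a cut-off frequency into stereoacuity, then passes the result to the geometry of stereopsis expression. The 0.064 m interpupillary distance and the viewing distances are illustrative assumptions, and the function names are hypothetical.

```python
import math

ARCSEC_PER_DEG = 3600.0
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

def stereoacuity_from_phase(cutoff_cpd, phase_deg):
    """Stereoacuity (arcsec) as a fixed phase fraction of the cut-off grating's period."""
    period_arcsec = ARCSEC_PER_DEG / cutoff_cpd
    return phase_deg / 360.0 * period_arcsec

def mpdd(fixation_m, stereoacuity_arcsec, ipd_m=0.064):
    """Magnitude of MPDD = P / {2[(P/2D) + tan(S/2)]} - D in meters (ipd_m assumed)."""
    half_angle_rad = stereoacuity_arcsec / ARCSEC_PER_RAD / 2.0
    nearer_m = ipd_m / (2.0 * (ipd_m / (2.0 * fixation_m) + math.tan(half_angle_rad)))
    return fixation_m - nearer_m

# Legge and Gu's phase range (3 to 14 degrees) with the 8 cycles/deg cut-off.
for phase in (3.0, 14.0):
    s = stereoacuity_from_phase(8.0, phase)
    depths = ", ".join(f"{d} m: {mpdd(d, s):.2f} m" for d in (5, 20, 50))
    print(f"phase {phase:>4} deg -> {s:.2f} arcsec; MPDD {depths}")
```

With the 14-degree phase, the 450 arcsec period of an 8 cycles/degree grating yields 17.5 arcsec, the roughly 18 arcsec quoted above.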

The linear systems analysis above assumes that the images presented to each eye are identical except for the small phase shifts resulting from their separate viewing locations. Asymmetries between the right and left eyes’ images in size, resolution, focus, contrast, and luminance distort the phase shifts and reduce stereopsis (10,27,41). If stereopsis is a priority, considerable diligence is required in the design, development, evaluation and manufacturing of binocular HMDs to ensure that the right and left channels are well matched within the tolerances of the human visual system.

In summary, for binocular HMDs the maximal benefit from stereo is at near, and the rate at which the benefit is lost with increasing viewing distance depends largely on the HMD’s image quality. As image quality improves, the usefulness of the stereoscopic cues in binocular HMDs will increase, especially for moderate to long distances. Eventually, binocular HMDs will support high-quality stereopsis at relevant distances for both helicopter and fixed-wing aircraft operations, including taxi, take-off, landing, terrain following, and formation flying.

Probability Summation

Probability Summation of Independent Detectors is a simplified model that has been used widely in multichannel sensory research (28,36,42). For binocular vision, the model assumes that each eye acts as an independent detector and that the component processes building towards the response are stochastic. Given these assumptions, the model predicts that, for a variable input (stimulus), the minimum (threshold) value required to initiate a correct dichotomous output (response) will vary over time, and that if the threshold is measured a large number of times, the threshold values will be normally distributed. This model has been used for many types of stimuli (spots, gratings, random-dot patterns) varied on a wide range of characteristics (size, luminance, contrast, duration, wavelength). Likewise, many types of behavior can be used as the measured response, for example detection, resolution, identification, and comprehension.

Normal distributions are most commonly graphed as bell-shaped functions, but an equally valid representation is the S-shaped cumulative function. In this case the cumulative function indicates the probability that the threshold is a given value or less. The cumulative function of threshold data is used regularly in psychophysical research, where it is referred to as the Psychometric Function. In vision research it is also called the Frequency-of-Seeing or Probability-of-Seeing Curve. Figure 4 shows a hypothetical Probability-of-Seeing Curve for a standard normal threshold distribution for one eye, together with the probability summation model for the two eyes.

The model is straightforward. The two-eye probability-of-seeing curve is the sum of the individual eye probabilities minus the probability that both eyes see the stimulus:

$$P(\text{cum}) = P(\text{Eye}_1) + P(\text{Eye}_2) - \left[P(\text{Eye}_1) \times P(\text{Eye}_2)\right]$$

The probability associated with seeing the stimulus with both eyes, [P(Eye1) × P(Eye2)], is included in both P(Eye1) and P(Eye2) and therefore gets double counted, which is why it needs to be subtracted once in the calculation. If the two eyes are very similar, the equation simplifies to:

$$P(\text{cum}) = 2P(\text{single eye}) - P(\text{single eye})^2$$
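A minimal sketch of the two probability summation equations, assuming independent eyes; the function names are illustrative. It also makes explicit that subtracting the double-counted term is the same as computing one minus the probability that neither eye sees the stimulus.

```python
def probability_summation(p_eye1: float, p_eye2: float) -> float:
    """P(cum) = P(Eye1) + P(Eye2) - P(Eye1) x P(Eye2), assuming independent eyes."""
    return p_eye1 + p_eye2 - p_eye1 * p_eye2


def probability_summation_identical(p_single: float) -> float:
    """Special case for two eyes with the same probability of seeing: 2P - P^2."""
    return 2.0 * p_single - p_single ** 2


if __name__ == "__main__":
    p1, p2 = 0.5, 0.5
    # Subtracting the "both eyes" term once equals 1 - P(neither eye sees it).
    print(probability_summation(p1, p2))          # 0.75
    print(1.0 - (1.0 - p1) * (1.0 - p2))          # 0.75
    print(probability_summation_identical(0.5))   # 0.75
```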

Several interesting points can be seen in Figure 4. If a stimulus is strong or suprathreshold, for instance 2.5 or more standard deviations above the mean threshold value, it will almost certainly be seen by either eye alone. Stimuli that are weak or subthreshold have very little chance of being seen by either eye. Consequently, the probability summation model predicts that probability-of-seeing experiments run at either suprathreshold or subthreshold intensities have little chance of discriminating between the binocular and monocular conditions.

If, however, the stimulus is near threshold, two independent eyes have a substantially better probability of seeing the stimulus than a single eye. The binocular benefit peaks at the mean threshold value, where each eye alone sees the stimulus with probability 0.5 and the two eyes together see it with probability 0.75, an improvement of 0.25 (Figure 4). Analyzing the benefit of binocularity at suprathreshold levels requires a different strategy, for example one that involves the measurement of reaction time.
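The size of the predicted binocular gain across stimulus levels can be sketched numerically, assuming the same standard normal threshold distribution as Figure 4. The gain (2P − P²) − P simplifies to P(1 − P), which is why it is largest at P = 0.5 and nearly vanishes at ±2.5 standard deviations. Names are illustrative.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binocular_benefit(z: float) -> float:
    """Probability-summation gain over one eye at stimulus level z (in SD units).

    (2P - P^2) - P simplifies to P(1 - P).
    """
    p = phi(z)
    return p * (1.0 - p)


if __name__ == "__main__":
    for z in (-2.5, -1.0, 0.0, 1.0, 2.5):
        print(f"z = {z:+.1f}: monocular {phi(z):.3f}, gain {binocular_benefit(z):.3f}")
    # Search a fine grid for the peak; it falls at the mean threshold (z = 0).
    grid = [i / 100.0 for i in range(-400, 401)]
    z_peak = max(grid, key=binocular_benefit)
    print(z_peak, round(binocular_benefit(z_peak), 3))
```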
Figure 4. Probability-of-Seeing Curves Showing Predicted Improvement from Probability
Summation due to Binocularity


The probability summation model can be applied to reaction times under the same assumptions, specifically two independent channels built from stochastic components. In this form, called the race model, the stimulus input races through the separate sensory channels and the fastest channel initiates the response (11,29,37). In the race model the roles of the experimental variables are reversed: the stimulus is dichotomous and the dependent measure, reaction time, is continuous, which allows the race model to be used across the full spectrum of suprathreshold stimuli. The probability summation model has been extremely useful in guiding experimental design and the interpretation of results. However, several decades of subsequent research have shown that it provides a conservative estimate of the performance enhancement derived from binocular vision. For the majority of people with normal binocular vision, probability summation underestimates visual performance because the two eyes work together synergistically, performing better than predicted by their individual contributions.
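The race model can be sketched as a small Monte Carlo simulation. Assumptions, all hypothetical: each eye's channel finishing time is normally distributed with a mean of 300 ms and a standard deviation of 40 ms, the two channels are independent, and the binocular response is triggered by the faster of the two. For two identical normal channels the expected minimum is mean − sd/√π, which gives an analytic check on the simulation. Under this framework, binocular summation is claimed when measured binocular reaction times are faster than this race prediction.

```python
import math
import random
import statistics


def simulate_race(n_trials: int, mean_ms: float, sd_ms: float, seed: int = 1):
    """Return (monocular mean RT, race-model binocular mean RT) in milliseconds.

    Each channel finishes at an independent, normally distributed time (an
    illustrative assumption); the faster channel triggers the binocular response.
    """
    rng = random.Random(seed)
    monocular = [rng.gauss(mean_ms, sd_ms) for _ in range(n_trials)]
    binocular = [min(rng.gauss(mean_ms, sd_ms), rng.gauss(mean_ms, sd_ms))
                 for _ in range(n_trials)]
    return statistics.fmean(monocular), statistics.fmean(binocular)


if __name__ == "__main__":
    mono, bino = simulate_race(100_000, mean_ms=300.0, sd_ms=40.0)
    print(f"monocular mean RT: {mono:.1f} ms")
    print(f"race-model binocular mean RT: {bino:.1f} ms")
    # Expected minimum of two identical independent normal channels.
    print(f"analytic race prediction: {300.0 - 40.0 / math.sqrt(math.pi):.1f} ms")
```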

Binocular  Summation 

In its most generic use, binocular summation simply means that sensory information from the left and right eyes is combined, resulting in improved visual perception. Evidence for binocular summation exists when visual performance exceeds the predictions of probability summation (5). Binocular performance improvements are most evident under luminance or contrast threshold conditions, where the probability-of-seeing curves for binocular detection, resolution, and Vernier acuity clearly exceed both measured monocular performance and the binocular performance predicted by probability summation (2,5).

In addition to the generic usage, binocular summation also describes a specific model based on signal processing theory (SPT). This model assumes that the visual signals received and processed by the left and right eye channels are similar, highly correlated, and summed together before a perceptual determination is made. Therefore, the resulting binocular signal (Sb) is simply the sum of two essentially identical monocular signals (Sm). In contrast, the noise produced in the left and right eye channels is assumed to be similar in magnitude, stochastic, and uncorrelated between the two channels. According to SPT, two uncorrelated noise sources are expected to sum as the root-sum-of-squares (9). Given these assumptions, both left and right channel noise can be denoted as monocular noise (Nm), and an equation for predicting the improvement in signal-to-noise ratio for the binocular condition can be written as:

$$\frac{S_b}{N_b} = \frac{2S_m}{\sqrt{N_m^2 + N_m^2}} = \frac{2S_m}{\sqrt{2}\,N_m} = \sqrt{2}\,\frac{S_m}{N_m}$$

where Nb is the combined binocular noise.
This model is consistent with retinal physiology, where detection thresholds are heavily influenced by dark noise in the retinal photoreceptors (19). As with probability summation, the predictions of the SPT binocular summation model can be graphed on a probability-of-seeing curve. The binocular summation curve is equivalent to the monocular curve shifted to the left by a factor of 1/√2. The relationship between the probability summation prediction and the binocular summation prediction depends on the slope of the monocular curve: the steeper the curve, the larger the separation between the binocular summation and probability summation predictions.
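The √2 prediction can be checked with a small Monte Carlo sketch. Assumptions, all illustrative: Gaussian channel noise, identical monocular signals Sm = 1 and noise Nm = 1, and linear summation of the two channels before the decision. A second case feeds the same noise to both channels; because correlated noise adds linearly rather than as a root-sum-of-squares, summing then yields no SNR gain.

```python
import math
import random
import statistics


def estimated_snr(responses):
    """Signal-to-noise ratio estimated as mean response divided by its standard deviation."""
    return statistics.fmean(responses) / statistics.stdev(responses)


def snr_gains(signal_m=1.0, noise_m=1.0, n=100_000, seed=7):
    """Return (uncorrelated-noise gain, correlated-noise gain) relative to one channel."""
    rng = random.Random(seed)
    noise_left = [rng.gauss(0.0, noise_m) for _ in range(n)]
    noise_right = [rng.gauss(0.0, noise_m) for _ in range(n)]
    monocular = [signal_m + nl for nl in noise_left]
    # SPT case: identical signals add linearly, uncorrelated noise adds as root-sum-of-squares.
    binocular = [(signal_m + nl) + (signal_m + nr) for nl, nr in zip(noise_left, noise_right)]
    # Contrast case: identical noise in both channels adds linearly, so SNR is unchanged.
    correlated = [(signal_m + nl) + (signal_m + nl) for nl in noise_left]
    snr_m = estimated_snr(monocular)
    return estimated_snr(binocular) / snr_m, estimated_snr(correlated) / snr_m


if __name__ == "__main__":
    gain, gain_correlated = snr_gains()
    print(f"uncorrelated noise: SNR gain {gain:.3f} (SPT predicts {math.sqrt(2.0):.3f})")
    print(f"correlated noise:   SNR gain {gain_correlated:.3f}")
```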


Human contrast sensitivity measurement provides an obvious approach for verifying the SPT binocular summation model. The task of identifying a luminance increment over a background is conceptually equivalent to recognizing a signal over noise, and several researchers have demonstrated a consistent √2 relationship between monocular and binocular performance using this experimental paradigm (7,9). Figure 5 shows the results from one subject in a contrast detection experiment conducted by Legge, replotted on a probability-of-seeing chart along with the probability summation and binocular summation predictions (24). Several interesting observations can be made from this figure. The binocular summation curve and the monocular curve turn up at similar locations, but the binocular summation curve ascends more rapidly, resulting in a steeper slope and a clear separation between the two models’ predictions near threshold. The measured binocular performance and the two models’ predictions converge on a probability of one long before the measured monocular performance asymptotes. Most importantly, the subject’s measured binocular performance exceeds both models’ predictions at lower contrasts but comes into alignment with the binocular summation curve as contrast increases.
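The two model predictions can be compared in a toy sketch. Assumptions, not taken from Legge's data: the monocular curve is a cumulative normal in log10 contrast with a hypothetical 1% threshold and spread `sigma_log`; binocular summation is that curve shifted left by 1/√2 (a √2 gain in effective contrast); probability summation is 2P − P² from the same monocular curve. For the illustrative slopes below, the steeper curve gives the larger separation at threshold; in this toy parameterization that ordering holds while the binocular summation curve lies above the probability summation curve.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def monocular_curve(contrast: float, threshold: float, sigma_log: float) -> float:
    """Assumed monocular probability-of-seeing curve: cumulative normal in log10 contrast."""
    return phi((math.log10(contrast) - math.log10(threshold)) / sigma_log)


def binocular_summation(contrast: float, threshold: float, sigma_log: float) -> float:
    """SPT prediction: monocular curve evaluated at sqrt(2) times the contrast."""
    return monocular_curve(math.sqrt(2.0) * contrast, threshold, sigma_log)


def probability_summation(contrast: float, threshold: float, sigma_log: float) -> float:
    """Two independent, identical eyes: 2P - P^2."""
    p = monocular_curve(contrast, threshold, sigma_log)
    return 2.0 * p - p * p


if __name__ == "__main__":
    threshold = 0.01  # hypothetical monocular threshold contrast (1%)
    for sigma in (0.08, 0.15):  # smaller sigma means a steeper monocular curve
        bs = binocular_summation(threshold, threshold, sigma)
        ps = probability_summation(threshold, threshold, sigma)
        print(f"sigma={sigma}: binocular summation {bs:.3f}, "
              f"probability summation {ps:.3f}, separation {bs - ps:.3f}")
```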
Figure 5. Measured Monocular and Binocular Probability-of-Seeing Curves (solid lines)
Plotted with Probability Summation and Binocular Summation Model Predictions
(dashed/dotted lines)


Demonstrating binocular summation at suprathreshold stimulus levels has largely depended on reaction time experiments, in which binocular summation is shown when binocular reaction times are faster than the race model predicts. Given the increased signal strength predicted by binocular summation and the well-known positive relationship between stimulus strength and neural processing speed, it is not surprising that binocular summation has been reliably demonstrated using this approach (7). The effect is robust over a large range of stimulus variables, including size, eccentricity, spatial frequency, contrast, and blur. The improvements in simple reaction
time  (button  push  or  button  release)  cluster  around  30  milliseconds  in  these  experiments 
(7,30,44).  The  percent  improvement  on  reaction  time  varies  by  experiment,  but  a  10%  reduction 
in  reaction  time  is  common  (6,7).  The  increased  speed  in  the  early  stages  of  visual  processing 
can  also  be  measured  electro-diagnostically  using  visual  evoked  potentials.  A  study  of  defocus 
effects  on  visual  processing  showed  a  2-8  millisecond  binocular  facilitation  at  the  level  of  the 
primary  visual  cortex,  with  the  8  millisecond  binocular  advantage  occurring  under  the  most 
degraded  viewing  condition  (40).  This  example  of  increasing  binocular  facilitation  under 
degraded conditions is not unique; similar findings have been reported in experiments that use size,
luminance,  contrast,  severity  of  distractors,  and  degree  of  eccentricity  as  independent  variables 
(30,44,45). 

The SPT model enables a theoretical comparison of the monocular, biocular, and binocular HMD conditions. Aided vision is a two-layered sensing system: the first layer is the electronic HMD system, and the second is the physiological (eye/brain) system. The SPT binocular summation model predicts that a binocular HMD will mitigate noise in both the electronic and physiological sensing layers. The biocular HMD configuration mitigates noise only in the physiological sensing layer, because both eyes view the same sensor image and therefore receive identical (fully correlated) electronic noise; the monocular HMD provides no noise mitigation in either layer.
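The layered comparison can be sketched with the same SPT rules, under loudly hypothetical assumptions: a signal S per eye channel, independent electronic noise σe per sensor, independent physiological noise σp per eye, and linear summation of the two eye channels. In the biocular case one sensor feeds both eyes, so its noise is identical in both channels and adds linearly (2σe), while physiological noise still adds in quadrature. The function names are illustrative.

```python
import math


def snr_monocular(s: float, sigma_e: float, sigma_p: float) -> float:
    """One sensor, one eye: signal against electronic plus physiological noise."""
    return s / math.sqrt(sigma_e ** 2 + sigma_p ** 2)


def snr_biocular(s: float, sigma_e: float, sigma_p: float) -> float:
    """One sensor shown to both eyes: shared electronic noise adds linearly (2 * sigma_e),
    physiological noise from the two eyes adds in quadrature."""
    return 2.0 * s / math.sqrt((2.0 * sigma_e) ** 2 + 2.0 * sigma_p ** 2)


def snr_binocular(s: float, sigma_e: float, sigma_p: float) -> float:
    """Two independent sensors and two eyes: both noise layers add in quadrature."""
    return 2.0 * s / math.sqrt(2.0 * sigma_e ** 2 + 2.0 * sigma_p ** 2)


if __name__ == "__main__":
    s, sigma_e, sigma_p = 1.0, 1.0, 1.0  # hypothetical: equal noise in both layers
    for name, f in (("monocular", snr_monocular),
                    ("biocular", snr_biocular),
                    ("binocular", snr_binocular)):
        print(f"{name:>9}: SNR = {f(s, sigma_e, sigma_p):.3f}")
```

With no electronic noise the biocular and binocular predictions coincide, and with no physiological noise the biocular prediction falls back to the monocular one, matching the layer-by-layer argument in the text.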
Night  vision  goggles  are  often  used  under  near-threshold  viewing  conditions.  The  sensors  in 
these  devices  are  limited  by  a  number  of  noise  sources  including  dark  current,  thermal  noise,  and 
optical  scatter,  which  are  largely  uncorrelated  between  the  left  and  right  channels  of  a  binocular 
device.  Given  these  sensor  characteristics,  binocular  summation  predicts  a  measurable 
improvement  in  performance  for  the  binocular  NVG  configuration  at  threshold  levels.  Very  little 
experimental  evidence  has  been  collected  to  confirm  or  refute  this  prediction.  Significant 
resources are required to conduct rigorous NVG performance measurements, including light
sources that reproduce the spectral radiance of the night sky, low-light-level photometers, a
reliable  psychophysical  method,  well-practiced  observers,  and  the  time  to  take  a  large  number  of 
measurements  (34,35).  Additionally,  binocular  summation,  like  stereopsis,  depends  on  similar 
images  being  provided  to  the  left  and  right  eyes.  Unfortunately,  the  complicated  sensor,  display, 
and  optical  designs  required  for  binocular  HMDs  make  building  nearly  identical  displays  for  the 
two  eyes  difficult.  If  the  inevitable  differences  between  the  two  image  channels  exceed  human 
visual  tolerances,  the  advantages  of  binocularity  are  lost. 

Discussion 

Recent advances in microsensors, microdisplays, and microprocessors are creating new technology options for aided vision HMDs. The ability to see in previously inaccessible parts of the electromagnetic spectrum, along with the ability to overlay supplemental information, will create numerous opportunities for successfully navigating difficult visual environments. As new HMD systems are developed, the cost, power, and weight of providing a second sensor will have to be balanced against the expected benefits of binocularity.

Heretofore, the value of binocularity in aided vision has been controversial, with much attention given to a 1989 study by Wiley (48) comparing the stereoacuity obtained with binocular, biocular, and monocular NVGs, which failed to demonstrate a clear performance advantage for the binocular configuration. This study has been used to support the conclusion that binocular HMDs have very limited value (38). Close examination of Wiley’s study reveals that this conclusion may be premature. Considering the small number of measurements, the strict p-value criterion, and the use of now antiquated night vision devices, Wiley’s study should not be considered a definitive declaration regarding the value of the binocular HMD configuration. A subsequent evaluation by Knight et al., using Gen 3 NVGs, concluded that binocular NVGs do support stereopsis, but at a reduced level of stereoacuity (22). Additional experimental work is needed. A significant obstacle to this research is the difficulty of building a comparable set of monocular, biocular, and binocular HMDs in which the binocular aspects of performance are not confounded by other variables such as luminance, magnification, or resolution. But as HMD technologies improve, both the desire and the opportunity to assess the value of binocularity in aided vision systems will increase.

Experimental evidence is starting to demonstrate the value of binocularity in aided vision, at least for the limited visual range used in walking. Binocularity has been shown to improve walking speed over an obstacle course by 10% relative to the monocular condition (18). The Army Research Lab conducted an analogous walking experiment using binocular, biocular, and monocular Gen 3 NVGs and found a very similar advantage for the binocular condition (14,15). Another notable experiment estimated target detection ranges for binocular and monocular NVGs. Resolution measurements were obtained under well-controlled conditions at several levels of ambient illumination. The results were then used with the Johnson Criteria to create detection models for military-relevant targets. The model for a human target under 1/4 moon illumination estimated a detection range of 372 meters for the binocular night vision goggle condition, versus 340 meters for the monocular goggle (31). Given the effective range of individual weapons, the 32 meter (approx. 10%) improvement in detection distance could be critically important.
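The Johnson Criteria relate a perception level to the number of resolvable cycles that must span a target's critical dimension (roughly one cycle is commonly quoted for detection). Under small-angle geometry this gives range = critical dimension / (cycles × angle per cycle), so predicted range scales inversely with the resolvable angle. The sketch below is illustrative only: it does not reproduce the inputs of the cited study (31), and `johnson_range_m` is a hypothetical helper. It does check the arithmetic on the reported 372 m and 340 m figures.

```python
def johnson_range_m(critical_dim_m: float, cycles_required: float, cycle_angle_mrad: float) -> float:
    """Range at which `cycles_required` resolvable cycles span the target's critical dimension.

    Small-angle geometry: the target subtends critical_dim_m / range radians and each
    resolvable cycle subtends cycle_angle_mrad milliradians.
    """
    return critical_dim_m / (cycles_required * cycle_angle_mrad / 1000.0)


if __name__ == "__main__":
    # Reported model outputs for a human target under 1/4 moon illumination (31).
    binocular_m, monocular_m = 372.0, 340.0
    gain_m = binocular_m - monocular_m
    print(f"gain: {gain_m:.0f} m ({100.0 * gain_m / monocular_m:.1f}%)")  # gain: 32 m (9.4%)
    # Because range scales as 1 / cycle angle, equal cycle counts would imply the
    # binocular condition resolved a finer angle by this ratio (illustration only).
    print(f"implied angle ratio (binocular/monocular): {monocular_m / binocular_m:.3f}")
```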

Conclusion 

In theory, the binocular benefit observed in unaided vision should transfer to the aided vision HMD environment. Binocularity in human vision provides performance benefits across a wide spectrum of human behavior. These benefits result from at least two parallel neural streams: a comparative stream that extracts depth information from the binocular image, and a summation stream that combines the image information, resulting in improved perception. A comprehensive assessment of the advantages of binocularity has to take both streams into consideration. Such an assessment is complicated by large variability in human performance. Consequently, well-designed and rigorously executed experiments are necessary to reliably measure the benefits of binocularity. Models, specifically the geometry of stereopsis, linear systems analysis, probability summation, and binocular summation models of vision, have been extremely helpful in designing the experiments that ultimately demonstrated the performance advantages provided by binocular vision.

These models should likewise be useful for predicting aided vision performance and for designing experiments to validate those predictions. A basic prediction from the geometry of stereopsis and MTFA models is that improvements in image quality will increase the viewing distances over which binocularity contributes to depth perception. The probability and binocular summation models predict substantial improvements in visual performance under near-threshold viewing conditions. They also predict that reaction times for suprathreshold tasks should be slightly improved with binocular HMDs. Whether these expected binocular HMD benefits are meaningful depends on the specific aided vision application.